Skip to content

feat(func/greatest-least): new config param ignore null for func greatest/least#35239

Open
stephenkgu wants to merge 12 commits into3.0from
fix/6511294180
Open

feat(func/greatest-least): new config param ignore null for func greatest/least#35239
stephenkgu wants to merge 12 commits into3.0from
fix/6511294180

Conversation

@stephenkgu
Copy link
Copy Markdown
Contributor

Description

Issue(s)

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings April 27, 2026 09:15
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ignoreNullInGreatest configuration parameter, allowing the GREATEST and LEAST functions to skip NULL arguments. The changes include documentation updates in both English and Chinese, the addition of the configuration flag to the client and server components, and the implementation of the logic within the scalar function execution engine. Two significant issues were identified in the implementation: a potential stack buffer overflow in greatestLeastImpl due to a type mismatch when reading the ignoreNull flag, and a risk of accessing uninitialized memory in freeSCovertScarlarParams if an error occurs during parameter conversion. New test cases have been added to verify the feature and its interaction with existing configurations.

Comment thread source/libs/scalar/src/sclfunc.c Outdated
Comment thread source/libs/scalar/src/sclfunc.c Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new client-scope configuration ignoreNullInGreatest to control whether GREATEST/LEAST skip NULL arguments, along with documentation updates and CI test coverage.

Changes:

  • Introduces ignoreNullInGreatest as a dynamic client config and wires it into function translation/execution by appending a runtime flag parameter.
  • Updates scalar execution for GREATEST/LEAST to optionally skip row-level NULL values and NULL literals when configured.
  • Adds comprehensive scalar-function tests (base coverage + ignore-null coverage) and registers them in CI; updates EN/ZH docs and config-scope references.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/ci/cases.task Adds the new scalar test cases to CI execution list.
test/cases/11-Functions/01-Scalar/test_fun_sca_greatest_least.py New base behavior test coverage for GREATEST/LEAST.
test/cases/11-Functions/01-Scalar/test_fun_sca_greatest_least_ignorenull.py New coverage for ignoreNullInGreatest behavior and interaction with compareAsStrInGreatest.
source/libs/scalar/src/sclfunc.c Implements runtime NULL-skipping behavior in scalar execution for greatest/least.
source/libs/function/src/builtins.c Appends the ignore-null flag as a trailing param so per-vnode runtime sees the client-scope setting.
source/common/src/tglobal.c Adds the new client configuration option and dynamic-update wiring.
include/common/tglobal.h Exposes tsIgnoreNullInGreatest global.
docs/zh/14-reference/03-taos-sql/22-function.md Documents NULL behavior and new config option for GREATEST.
docs/zh/14-reference/01-components/08-config-scope.md Lists ignoreNullInGreatest in config scope table.
docs/zh/14-reference/01-components/02-taosc.md Documents ignoreNullInGreatest under taosc parameters.
docs/zh/14-reference/01-components/01-taosd.md Adds ignoreNullInGreatest entry to the taosd parameter reference.
docs/en/14-reference/03-taos-sql/22-function.md Documents NULL behavior and new config option for GREATEST.
docs/en/14-reference/01-components/08-config-scope.md Lists ignoreNullInGreatest in config scope table.
docs/en/14-reference/01-components/02-taosc.md Documents ignoreNullInGreatest under taosc parameters table.
docs/en/14-reference/01-components/01-taosd.md Adds ignoreNullInGreatest entry to the taosd parameter reference table.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/libs/scalar/src/sclfunc.c Outdated
Comment thread source/libs/scalar/src/sclfunc.c
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 27, 2026 09:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

source/libs/scalar/src/sclfunc.c:6206

  • In the IsNullType early-return path, numOfRows can remain 0 when outputType is TSDB_DATA_TYPE_NULL because the loop breaks immediately ("if (IsNullType) break;"). This will return an output column with 0 rows even though the input params may have rows (e.g., scalar constants: 1 row; table scan: N rows). Compute numOfRows from the data inputs regardless of outputType, then fill that many NULLs.
  bool    IsNullType = outputType == TSDB_DATA_TYPE_NULL ? true : false;
  // Compute row count and detect all-NULL-typed input case.
  for (int32_t i = 0; i < dataInputNum; i++) {
    if (IsNullType) {
      break;
    }
    if (numOfRows != 0 && numOfRows != pInput[i].numOfRows && pInput[i].numOfRows != 1 && numOfRows != 1) {
      qError("input rows not match, func:%s, rows:%d, %d", __FUNCTION__, numOfRows, pInput[i].numOfRows);
      code = TSDB_CODE_TSC_INTERNAL_ERROR;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/libs/scalar/src/sclfunc.c Outdated
@stephenkgu
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ignoreNullInGreatest configuration parameter, which allows the GREATEST and LEAST functions to skip NULL arguments instead of returning NULL by default. The changes span documentation updates in both English and Chinese, client-side configuration handling, and core logic modifications in the function translator and scalar execution engine. Additionally, new test suites covering base functionality, NULL-skipping behavior, and performance benchmarks have been integrated into the CI process. Review feedback identified critical missing NULL checks for memory allocation and column data access within the scalar function implementation.

Comment thread source/libs/scalar/src/sclfunc.c
Comment thread source/libs/scalar/src/sclfunc.c Outdated
Copilot AI review requested due to automatic review settings April 28, 2026 06:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

source/libs/scalar/src/sclfunc.c:6221

  • In greatestLeastImpl(), when outputType is TSDB_DATA_TYPE_NULL (which translateGreatestleast can set when ignoreNullInGreatest=0 and a NULL literal is present), IsNullType is true at initialization and the loop breaks immediately, leaving numOfRows==0. This makes the function return 0 rows (and does not mark any output rows NULL) even though callers created the output column with the correct row count (rowNum). Suggestion: always compute numOfRows from the data inputs regardless of IsNullType (e.g., remove the early break or move row-count calculation before the IsNullType short-circuit), then set all rows NULL when IsNullType is true.
  bool    IsNullType = outputType == TSDB_DATA_TYPE_NULL ? true : false;
  // Compute row count and detect all-NULL-typed input case.
  for (int32_t i = 0; i < dataInputNum; i++) {
    if (IsNullType) {
      break;
    }
    if (numOfRows != 0 && numOfRows != pInput[i].numOfRows && pInput[i].numOfRows != 1 && numOfRows != 1) {
      qError("input rows not match, func:%s, rows:%d, %d", __FUNCTION__, numOfRows, pInput[i].numOfRows);
      code = TSDB_CODE_TSC_INTERNAL_ERROR;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@stephenkgu
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ignoreNullInGreatest configuration parameter, allowing the GREATEST and LEAST functions to skip NULL arguments instead of returning NULL when any argument is NULL. The changes include documentation updates, configuration system integration, runtime logic implementation in sclfunc.c, and comprehensive test coverage. Feedback includes a potential null pointer dereference in greatestLeastImpl during memory allocation failure and a suggestion to unify error handling using the _return label.

Comment thread source/libs/scalar/src/sclfunc.c
Comment thread source/libs/scalar/src/sclfunc.c
Copilot AI review requested due to automatic review settings April 28, 2026 07:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/libs/scalar/src/sclfunc.c Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 28, 2026 07:08
@stephenkgu
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ignoreNullInGreatest configuration parameter for the GREATEST and LEAST functions, allowing users to toggle between MySQL-compatible NULL propagation and a mode that skips NULL arguments. The implementation includes updates to the documentation, configuration handling in both client and server, and the core scalar function logic. Feedback was provided regarding potential memory leaks in the greatestLeastImpl function, where inconsistent error handling using SCL_ERR_JRET might bypass the necessary resource cleanup at the _return label.

Comment thread source/libs/scalar/src/sclfunc.c
@stephenkgu
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new configuration parameter ignoreNullInGreatest for the GREATEST and LEAST functions, allowing users to skip NULL arguments instead of returning NULL. The changes include updates to documentation, global configuration handling, the core implementation in greatestLeastImpl and vectorCompareAndSelect, and comprehensive test coverage. I have identified two areas for improvement: the numOfCols parameter type in vectorCompareAndSelect should be updated to int32_t for consistency, and an early return should be added to greatestLeastImpl when numOfRows is 0 to avoid unnecessary memory allocation and potential errors.

Comment thread source/libs/scalar/src/sclfunc.c Outdated
Comment thread source/libs/scalar/src/sclfunc.c
@stephenkgu
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ignoreNullInGreatest configuration parameter, which allows the GREATEST and LEAST functions to optionally skip NULL arguments. By default, the functions maintain MySQL compatibility by returning NULL if any argument is NULL; however, when this new flag is enabled, they compare only non-NULL values. The implementation includes updates to the configuration system, function translation logic to pass the flag to the runtime, and the core scalar execution engine to handle row-level NULL skipping. Additionally, the PR provides extensive documentation updates in both English and Chinese, along with new test suites for functional verification and performance regression testing. I have no feedback to provide.

Copy link
Copy Markdown
Contributor

@JinqingKuang JinqingKuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review 总结

审查发现 1 个 Critical2 个 High 级别问题,建议修复后合并。

统计

  • Critical: 1 个(内存安全)
  • High: 2 个(内存泄漏 + 可维护性)
  • Medium: 2 个(文档 + 测试覆盖)

详细问题见下方评论。

@JinqingKuang
Copy link
Copy Markdown
Contributor

Code Review 总结

审查发现 1 个 Critical2 个 High 级别问题,建议修复后合并。

统计

  • 🔴 Critical: 1 个(内存安全)
  • 🟠 High: 2 个(内存泄漏 + 可维护性)
  • 🟡 Medium: 2 个(文档 + 测试覆盖)

问题 1: 🔴 Critical - 空指针解引用风险

文件: source/libs/scalar/src/sclfunc.c:312-321

typeGetTypeModFromColInfo(&pInput[inputNum - 1].columnData->info) 未检查 info 字段是否有效。如果结构体未正确初始化,可能触发段错误。

修复建议

if (pInput[inputNum - 1].columnData == NULL ||
    pInput[inputNum - 1].columnData->pData == NULL ||
    pInput[inputNum - 1].columnData->info.type == 0) {
  qError("greatestLeast: invalid flag column, func:%s", __FUNCTION__);
  code = TSDB_CODE_TSC_INTERNAL_ERROR;
  goto _return;
}

问题 2: 🟠 High - 内存泄漏风险

文件: source/libs/scalar/src/sclfunc.c:378-424

如果 vectorConvertSingleCol 失败导致循环提前退出,outIdx 会小于 effectiveNum。此时 freeSCovertScarlarParams(pCovertParams, effectiveNum) 会访问未初始化的内存槽位。

修复建议

_return:
  freeSCovertScarlarParams(pCovertParams, outIdx);  // 使用实际填充的数量
  taosMemoryFree(resultColIndex);
  return code;

需要在函数开头将 outIdx 初始化为 0,并在 _return 标签前确保其值正确。


问题 3: 🟠 High - 复杂嵌套逻辑

文件: source/libs/scalar/src/sclfunc.c:270-299

vectorCompareAndSelect 中的 NULL 处理逻辑嵌套过深(3 层嵌套 + 多个条件分支),圈复杂度高,难以维护和测试。

建议重构

static int32_t vectorCompareAndSelect(SCovertScarlarParam *pParams, int32_t numOfRows,
                                      int32_t numOfCols, int32_t *resultColIndex,
                                      EOperatorType optr, bool ignoreNull) {
  if (ignoreNull) {
    return vectorCompareAndSelectIgnoreNull(pParams, numOfRows, numOfCols, resultColIndex, optr);
  } else {
    return vectorCompareAndSelectDefault(pParams, numOfRows, numOfCols, resultColIndex, optr);
  }
}

将两种模式拆分为独立函数,降低单个函数的复杂度。


问题 4: 🟡 Medium - 文档不一致

文件: docs/en/14-reference/01-components/01-taosd.md:9

英文文档只有一行简短描述,而中文文档(docs/zh/14-reference/01-components/01-taosd.md:68-77)有完整的参数说明(类型、默认值、最小值、最大值、参数类型、动态修改、支持版本)。

建议:英文文档补充完整的参数说明格式,与中文文档保持一致的结构。


问题 5: 🟡 Medium - 测试覆盖不足

文件: test/cases/11-Functions/01-Scalar/test_fun_sca_greatest_least.py

当前测试缺少以下边界场景:

  1. 空表 + 全 NULL 列的组合(effectiveNum=0numOfRows=0
  2. 超大列数(50+ 列)的性能和正确性
  3. 混合常量和列输入时的 broadcast 边界

建议添加

  • case_empty_table_all_null_columns()
  • case_many_columns_boundary() (测试 50+ 列)
  • case_mixed_constant_column_broadcast()

这些场景可以在后续 PR 中补充。

@stephenkgu
Copy link
Copy Markdown
Contributor Author

1, info.type 在 translate 的时候赋值,这里不是指针,即便数据被踩乱了也不会 segfault
2, 内存申请是 calloc, 不会造成 memleak
3, 这里 cyclomatic complexity 约 6~8, 无需重构
4, 中英文档整体格式不同
5, Three new boundary cases added to TS & test suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants